master algorithm
Model Selection in Contextual Stochastic Bandit Problems
We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically.
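To make the master-base idea concrete, here is a minimal sketch of an adversarial master choosing among base algorithms. It uses a plain EXP3-style update and does not include the smoothing transformation from the abstract, nor is it the paper's CORRAL or EXP3.P master. The names `Exp3Master` and `run_master`, the parameter values, and the convention that each base is a callable returning a reward in [0, 1] are all assumptions made for illustration.

```python
import math
import random


class Exp3Master:
    """EXP3-style master over a fixed set of base algorithms (illustrative only)."""

    def __init__(self, num_bases, learning_rate, exploration, seed=0):
        self.num_bases = num_bases
        self.learning_rate = learning_rate
        self.exploration = exploration
        self.log_weights = [0.0] * num_bases
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights, mixed with uniform exploration.
        peak = max(self.log_weights)
        exp_w = [math.exp(w - peak) for w in self.log_weights]
        total = sum(exp_w)
        return [
            (1 - self.exploration) * w / total + self.exploration / self.num_bases
            for w in exp_w
        ]

    def select(self):
        # Sample a base index and return it with its sampling probability.
        probs = self.probabilities()
        index = self.rng.choices(range(self.num_bases), weights=probs)[0]
        return index, probs[index]

    def update(self, index, prob, reward):
        # Importance-weighted reward estimate; only the played base changes.
        self.log_weights[index] += self.learning_rate * reward / prob


def run_master(master, bases, rounds):
    """Play `rounds` steps; each base is a hypothetical callable returning a reward in [0, 1]."""
    total = 0.0
    for _ in range(rounds):
        index, prob = master.select()
        reward = bases[index]()
        master.update(index, prob, reward)
        total += reward
    return total
```

In a real model selection setting each base would itself be a learning algorithm (for example, a contextual bandit algorithm for one candidate model class) that only receives feedback on the rounds where the master picks it. Here the bases are stateless stand-ins, so the sketch only shows how the master shifts probability toward the better-performing base.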
Playing the Player: A Heuristic Framework for Adaptive Poker AI
Paterson, Andrew, Sanders, Carl
For years, the discourse around poker AI has been dominated by the concept of solvers and the pursuit of unexploitable, machine-perfect play. This paper challenges that orthodoxy. It presents Patrick, an AI built on the contrary philosophy: that the path to victory lies not in being unexploitable, but in being maximally exploitative. Patrick's architecture is a purpose-built engine for understanding and attacking the flawed, psychological, and often irrational nature of human opponents. Through detailed analysis of its design, its novel prediction-anchored learning method, and its profitable performance in a 64,267-hand trial, this paper makes the case that the solved myth is a distraction from the real, far more interesting challenge: creating AI that can master the art of human imperfection.
Supplement to "Model Selection in Contextual Stochastic Bandit Problems"
In Section D we present the proofs for Section 5.1. In Section H we show the proofs of the lower bounds in Section 6. We briefly outline some other direct applications of our results. CORRAL will achieve regret $O(\sqrt{|L| dT})$. B.1 Original Corral: The original Corral algorithm [2] is reproduced below. We reproduce the EXP3.P algorithm (Figure 3.1 in [ 's expected replay regret satisfies: Therefore total regret is bounded by 6 U(T,) log(T). D.2 Applications of Proposition 5.1: We now show that several algorithms are (U,, T) bounded: Lemma D.2.